Methods and Apparatus to Perform Audio Watermarking and Watermark Detection and Extraction

ABSTRACT

Methods and apparatus to perform audio watermarking and watermark detection and extraction are described herein. An example method includes receiving a media content signal, sampling the media content signal to generate samples, storing the samples in a buffer, determining a first sequence of samples in the buffer, determining a second sequence of samples in the buffer, wherein the second sequence of samples is of substantially equal length to the first sequence of samples, calculating an average of the first sequence of samples and the second sequence of samples to generate an average sequence of samples, extracting an identifier from the average sequence of samples, and storing the identifier in a tangible memory.

RELATED APPLICATIONS

This patent is a continuation of U.S. patent application Ser. No. 12/551,220, filed Aug. 31, 2009, entitled “Methods and Apparatus to Perform Audio Watermarking and Watermark Detection and Extraction,” which is a non-provisional of U.S. Provisional Application Ser. No. 61/174,708, filed May 1, 2009, entitled “METHODS AND APPARATUS TO PERFORM AUDIO WATERMARKING AND WATERMARK DETECTION AND EXTRACTION” and a non-provisional of U.S. Provisional Application Ser. No. 61/108,380, filed Oct. 24, 2008, entitled “STACKING METHOD FOR ADVANCED WATERMARK DETECTION.” The disclosures of U.S. patent application Ser. No. 12/551,220, U.S. Provisional Application Ser. No. 61/174,708, and U.S. Provisional Application Ser. No. 61/108,380 are incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates generally to media monitoring and, more particularly, to methods and apparatus to perform audio watermarking and watermark detection and extraction.

BACKGROUND

Identifying media information and, more specifically, audio streams (e.g., audio information) is useful for assessing audience exposure to television, radio, or any other media. For example, in television audience metering applications, a code may be inserted into the audio or video of media, wherein the code is later detected at monitoring sites when the media is presented (e.g., played at monitored households). The information payload of the code/watermark embedded into the original signal can consist of unique source identification, time of broadcast information, transactional information or additional content metadata.

Monitoring sites typically include locations such as, for example, households where the media consumption of audience members or audience member exposure to the media is monitored. For example, at a monitoring site, codes from the audio and/or video are captured and may be associated with audio or video streams of media associated with a selected channel, radio station, media source, etc. The collected codes may then be sent to a central data collection facility for analysis. However, the collection of data pertinent to media exposure or consumption need not be limited to in-home exposure or consumption.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic depiction of a broadcast audience measurement system employing a program identifying code added to the audio portion of a composite television signal.

FIG. 2 is a block diagram of an example encoder of FIG. 1.

FIG. 3 is a flow diagram illustrating an example encoding process that may be carried out by the example encoder of FIG. 2.

FIG. 4 is a flow diagram illustrating an example process that may be carried out to generate a frequency index table used in conjunction with the code frequency selector of FIG. 2.

FIG. 5 is a chart illustrating critical band indices and how they correspond to short and long block sample indices.

FIG. 6 illustrates one example of selecting frequency components that will represent a particular information symbol.

FIGS. 7-9 are charts illustrating different example code frequency configurations that may be generated by the process of FIG. 4 and used in conjunction with the code frequency selector of FIG. 2.

FIG. 10 illustrates the frequency relationship between the audio encoding indices.

FIG. 11 is a block diagram of the example decoder of FIG. 1.

FIG. 12 is a flow diagram illustrating an example decoding process that may be carried out by the example decoder of FIG. 11.

FIG. 13 is a flow diagram of an example process that may be carried out to stack audio in the decoder of FIG. 11.

FIG. 14 is a flow diagram of an example process that may be carried out to determine a symbol encoded in an audio signal in the decoder of FIG. 11.

FIG. 15 is a flow diagram of an example process that may be carried out to process a buffer to identify messages in the decoder of FIG. 11.

FIG. 16 illustrates an example set of circular buffers that may store message symbols.

FIG. 17 illustrates an example set of pre-existing code flag circular buffers that may store message symbols.

FIG. 18 is a flow diagram of an example process that may be carried out to validate identified messages in the decoder of FIG. 11.

FIG. 19 illustrates an example filter stack that may store identified messages in the decoder of FIG. 11.

FIG. 20 is a schematic illustration of an example processor platform that may be used and/or programmed to perform any or all of the processes or implement any or all of the example systems, example apparatus and/or example methods described herein.

DETAILED DESCRIPTION

The following description makes reference to audio encoding and decoding that is also commonly known as audio watermarking and watermark detection, respectively. It should be noted that in this context, audio may be any type of signal having a frequency falling within the normal human audibility spectrum. For example, audio may be speech, music, an audio portion of an audio and/or video program or work (e.g., a television program, a movie, an Internet video, a radio program, a commercial spot, etc.), a media program, noise, or any other sound.

In general, as described in detail below, the encoding of the audio inserts one or more codes or information (e.g., watermarks) into the audio and, ideally, leaves the code inaudible to hearers of the audio. However, there may be certain situations in which the code may be audible to certain listeners. The codes that are embedded in audio may be of any suitable length and any suitable technique for assigning the codes to information may be selected.

As described below, the codes or information to be inserted into the audio may be converted into symbols that will be represented by code frequency signals to be embedded in the audio to represent the information. The code frequency signals include one or more code frequencies, wherein different code frequencies or sets of code frequencies are assigned to represent different symbols of information. Techniques for generating one or more tables mapping symbols to representative code frequencies such that symbols are distinguishable from one another at the decoder are also described. Any suitable encoding or error correcting technique may be used to convert codes into symbols.

By controlling the amplitude at which these code frequency signals are input into the native audio, the presence of the code frequency signals can be imperceptible to human hearing. Accordingly, in one example, masking operations based on the energy content of the native audio at different frequencies and/or the tonality or noise-like nature of the native audio are used to provide information upon which the amplitude of the code frequency signals is based.

Additionally, it is possible that an audio signal has passed through a distribution chain, where, for example, the content has passed from a content originator to a network distributor (e.g., NBC national) and further passed to a local content distributor (e.g., NBC in Chicago). As the audio signal passes through the distribution chain, one of the distributors may encode a watermark into the audio signal in accordance with the techniques described herein, thereby including in the audio signal an indication of that distributor's identity or the time of distribution. The encoding described herein is very robust and, therefore, codes inserted into the audio signal are not easily removed. Accordingly, any subsequent distributors of the audio content may use techniques described herein to encode the previously encoded audio signal in a manner such that the code of the subsequent distributor will be detectable and any crediting due that subsequent distributor will be acknowledged.

Additionally, due to the repetition or partial repetition of codes within a signal, code detection can be improved by stacking messages and transforming the encoded audio signal into a signal having an accentuated code. As the audio signal is sampled at a monitored location, substantially equal sized blocks of audio samples are summed and averaged. This stacking process takes advantage of the temporal properties of the audio signal to cause the code signal to be accentuated within the audio signal. Accordingly, the stacking process, when used, can provide increased robustness to noise or other interference. For example, the stacking process may be useful when the decoding operation uses a microphone that might pick up ambient noise in addition to the audio signal output by a speaker.
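The stacking operation described above can be illustrated with a minimal sketch, assuming the buffered samples are held in a plain Python list and that the blocks to be stacked are the most recent, back-to-back blocks in the buffer. The function name `stack_blocks` and the toy data are hypothetical and are not taken from the described decoder.

```python
def stack_blocks(samples, block_len, num_blocks):
    """Average the most recent `num_blocks` equal-length blocks, element by element."""
    needed = block_len * num_blocks
    if len(samples) < needed:
        raise ValueError("not enough samples buffered")
    recent = samples[-needed:]
    blocks = [recent[i * block_len:(i + 1) * block_len] for i in range(num_blocks)]
    # zip(*blocks) groups the samples that share a position within each block.
    return [sum(column) / num_blocks for column in zip(*blocks)]


# Toy data: the same "code" pattern repeats in both blocks, while the
# interference has opposite sign in each block and averages out.
code = [1.0, -1.0, 0.5, -0.5]
noisy = [c + 0.3 for c in code] + [c - 0.3 for c in code]
stacked = stack_blocks(noisy, block_len=4, num_blocks=2)
```

In this toy case the repeated pattern survives the averaging while the interference cancels, which is the effect the stacking process relies on.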

A further technique to add robustness to the decoding operations described herein provides for validation of messages identified by a decoding operation. After messages are identified in an encoded audio signal, the messages are added to a stack. Subsequent repetitions of messages are then compared to identify matches. When a message can be matched to another message identified at the proper repetition interval, the messages are identified as validated. When a message can be partially matched to another message that has already been validated, the message is marked as partially validated and subsequent messages are used to identify parts of the message that may have been corrupted. According to this example validation technique, messages are only output from the decoder if they can be validated. Such a technique prevents errors in messages caused by interference and/or detection errors.

The following examples pertain generally to encoding an audio signal with information, such as a code, and obtaining that information from the audio via a decoding process. The following example encoding and decoding processes may be used in several different technical applications to convey information from one place to another.

The example encoding and decoding processes described herein may be used to perform broadcast identification. In such an example, before a work is broadcast, that work is encoded to include a code indicative of the source of the work, the broadcast time of the work, the distribution channel of the work, or any other information deemed relevant to the operator of the system. When the work is presented (e.g., played through a television, a radio, a computing device, or any other suitable device), persons in the area of the presentation are exposed not only to the work, but, unbeknownst to them, are also exposed to the code embedded in the work. Thus, persons may be provided with decoders that operate on a microphone-based platform so that the work may be obtained by the decoder using free-field detection and processed to extract codes therefrom. The codes may then be logged and reported back to a central facility for further processing. The microphone-based decoders may be dedicated, stand-alone devices, or may be implemented using cellular telephones or any other types of devices having microphones and software to perform the decoding and code logging operations. Alternatively, wire-based systems may be used whenever the work and its attendant code may be picked up via a hard wired connection.

The example encoding and decoding processes described herein may be used, for example, in tracking and/or forensics related to audio and/or video works by, for example, marking copyrighted audio and/or associated video content with a particular code. The example encoding and decoding processes may be used to implement a transactional encoding system in which a unique code is inserted into a work when that work is purchased by a consumer, thus allowing a media distributor to identify a source of a work. The purchasing may include a purchaser physically receiving a tangible media (e.g., a compact disk, etc.) on which the work is included, or may include downloading of the work via a network, such as the Internet. In the context of transactional encoding systems, each purchaser of the same work receives the work, but the work received by each purchaser is encoded with a different code. That is, the code inserted in the work may be personal to the purchaser, wherein each work purchased by that purchaser includes that purchaser's code. Alternatively, each work may be encoded with a code that is serially assigned.

Furthermore, the example encoding and decoding techniques described herein may be used to carry out control functionality by hiding codes in a steganographic manner, wherein the hidden codes are used to control target devices programmed to respond to the codes. For example, control data may be hidden in a speech signal, or any other audio signal. A decoder in the area of the presented audio signal processes the received audio to obtain the hidden code. After obtaining the code, the target device takes some predetermined action based on the code. This may be useful, for example, in the case of changing advertisements within stores based on audio being presented in the store, etc. For example, scrolling billboard advertisements within a store may be synchronized to an audio commercial being presented in the store through the use of codes embedded in the audio commercial.

An example encoding and decoding system 100 is shown in FIG. 1. The example system 100 may be, for example, a television audience measurement system, which will serve as a context for further description of the encoding and decoding processes described herein. The example system 100 includes an encoder 102 that adds a code or information 103 to an audio signal 104 to produce an encoded audio signal. The information 103 may be any selected information. For example, in a media monitoring context, the information 103 may be representative of an identity of a broadcast media program such as a television broadcast, a radio broadcast, or the like. Additionally, the information 103 may include timing information indicative of a time at which the information 103 was inserted into audio or a media broadcast time. Alternatively, the code may include control information that is used to control the behavior of one or more target devices.

The audio signal 104 may be any form of audio including, for example, voice, music, noise, commercial advertisement audio, audio associated with a television program, live performance, etc. In the example of FIG. 1, the encoder 102 passes the encoded audio signal to a transmitter 106. The transmitter 106 transmits the encoded audio signal along with any video signal 108 associated with the encoded audio signal. While, in some instances, the encoded audio signal may have an associated video signal 108, the encoded audio signal need not have any associated video.

In one example, the audio signal 104 is a digitized version of an analog audio signal, wherein the analog audio signal has been sampled at 48 kilohertz (kHz). As described below in detail, two seconds of audio, which correspond to 96,000 audio samples at the 48 kHz sampling rate, may be used to carry one message, which may be a synchronization symbol and 49 bits of information. Using an encoding scheme of 7 bits per symbol, the message requires transmission of eight symbols of information. Alternatively, in the context of overwriting described below, one synchronization symbol is used and one information symbol conveying one of 128 states follows the synchronization symbol. As described below in detail, according to one example, one 7-bit symbol of information is embedded in a long block of audio samples, which corresponds to 9216 samples. In one example, such a long block includes 36 overlapping short blocks, each of which advances by 256 samples; with 50% overlap, each 512-sample short block contains 256 samples that are old and 256 samples that are new.
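The figures in the preceding paragraph follow from simple arithmetic, which the following sketch reproduces. The constant names are illustrative only; the values are those of the example above.

```python
SAMPLE_RATE_HZ = 48_000      # sampling rate of the example
MESSAGE_PERIOD_S = 2         # one message every two seconds
BITS_PER_SYMBOL = 7
INFO_BITS = 49
LONG_BLOCK = 9216            # samples carrying one symbol
SHORT_HOP = 256              # new samples per overlapping short block

samples_per_message = SAMPLE_RATE_HZ * MESSAGE_PERIOD_S   # 96,000 samples
info_symbols = INFO_BITS // BITS_PER_SYMBOL               # 7 information symbols
symbols_per_message = 1 + info_symbols                    # plus 1 synchronization symbol
symbol_states = 2 ** BITS_PER_SYMBOL                      # states per 7-bit symbol
short_blocks_per_long = LONG_BLOCK // SHORT_HOP           # short blocks per long block
```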

Although the transmit side of the example system 100 shown in FIG. 1 shows a single transmitter 106, the transmit side may be much more complex and may include multiple levels in a distribution chain through which the audio signal 104 may be passed. For example, the audio signal 104 may be generated at a national network level and passed to a local network level for local distribution. Accordingly, although the encoder 102 is shown in the transmit lineup prior to the transmitter 106, one or more encoders may be placed throughout the distribution chain of the audio signal 104. Thus, the audio signal 104 may be encoded at multiple levels and may include embedded codes associated with those multiple levels. Further details regarding encoding and example encoders are provided below.

The transmitter 106 may include one or more of a radio frequency (RF) transmitter that may distribute the encoded audio signal through free space propagation (e.g., via terrestrial or satellite communication links) or a transmitter used to distribute the encoded audio signal through cable, fiber, etc. In one example, the transmitter 106 may be used to broadcast the encoded audio signal throughout a broad geographical area. In other cases, the transmitter 106 may distribute the encoded audio signal through a limited geographical area. The transmission may include up-conversion of the encoded audio signal to radio frequencies to enable propagation of the same. Alternatively, the transmission may include distributing the encoded audio signal in the form of digital bits or packets of digital bits that may be transmitted over one or more networks, such as the Internet, wide area networks, or local area networks. Thus, the encoded audio signal may be carried by a carrier signal, by information packets or by any suitable technique to distribute the audio signals.

When the encoded audio signal is received by a receiver 110, which, in the media monitoring context, may be located at a statistically selected metering site 112, the audio signal portion of the received program signal is processed to recover the code, even though the presence of that code is imperceptible (or substantially imperceptible) to a listener when the encoded audio signal is presented by speakers 114 of the receiver 110. To this end, a decoder 116 is connected either directly to an audio output 118 available at the receiver 110 or to a microphone 120 placed in the vicinity of the speakers 114 through which the audio is reproduced. The received audio signal can be either in a monaural or stereo format. Further details regarding decoding and example decoders are provided below.

Audio Encoding

As explained above, the encoder 102 inserts one or more inaudible (or substantially inaudible) codes into the audio 104 to create encoded audio. One example encoder 102 is shown in FIG. 2. In one implementation, the example encoder 102 of FIG. 2 may be implemented using, for example, a digital signal processor programmed with instructions to implement an encoding lineup 202, the operation of which is affected by the operations of a prior code detector 204 and a masking lineup 206, either or both of which can be implemented using a digital signal processor programmed with instructions. Of course, any other implementation of the example encoder 102 is possible. For example, the encoder 102 may be implemented using one or more processors, programmable logic devices, or any suitable combination of hardware, software, and firmware.

In general, during operation, the encoder 102 receives the audio 104 and the prior code detector 204 determines if the audio 104 has been previously encoded with information, which will make it difficult for the encoder 102 to encode additional information into the previously encoded audio. For example, a prior encoding may have been performed at a prior location in the audio distribution chain (e.g., at a national network level). The prior code detector 204 informs the encoding lineup 202 as to whether the audio has been previously encoded. The prior code detector 204 may be implemented by a decoder as described herein.

The encoding lineup 202 receives the information 103 and produces code frequency signals based thereon and combines the code frequency signals with the audio 104. The operation of the encoding lineup 202 is influenced by the output of the prior code detector 204. For example, if the audio 104 has been previously encoded and the prior code detector 204 informs the encoding lineup 202 of this fact, the encoding lineup 202 may select an alternate message that is to be encoded in the audio 104 and may also alter the details by which the alternate message is encoded (e.g., different temporal location within the message, different frequencies used to represent symbols, etc.).

The encoding lineup 202 is also influenced by the masking lineup 206. In general, the masking lineup 206 processes the audio 104 corresponding to the point in time at which the encoding lineup 202 wants to encode information and determines the amplitude at which the encoding should be performed. As described below, the masking lineup 206 may output a signal to control code frequency signal amplitudes to keep the code frequency signal below the threshold of human perception.

As shown in the example of FIG. 2, the encoding lineup 202 includes a message generator 210, a symbol selector 212, a code frequency selector 214, a synthesizer 216, an inverse Fourier transform 218, and a combiner 220. The message generator 210 is responsive to the information 103 and outputs messages having the format generally shown at reference numeral 222. The information 103 provided to the message generator may be the current time, a television or radio station identification, a program identification, etc. In one example, the message generator 210 may output a message every two seconds. Of course, other messaging intervals are possible.

In one example, the message format 222 representative of messages output from the message generator 210 includes a synchronization symbol 224. The synchronization symbol 224 is used by decoders, examples of which are described below, to obtain timing information indicative of the start of a message. Thus, when a decoder receives the synchronization symbol 224, that decoder expects to see additional information following the synchronization symbol 224.

In the example message format 222 of FIG. 2, the synchronization symbol 224 is followed by 42 bits of message information 226. This information may include a binary representation of a station identifier and coarse timing information. In one example, the timing information represented in the 42 bits of message information 226 changes every 64 seconds, or 32 message intervals. Thus, the 42 bits of message information 226 remain static for 64 seconds. The seven bits of message information 228 may be a high resolution time that increments every two seconds.

The message format 222 also includes pre-existing code flag information 230. However, the pre-existing code flag information 230 is only selectively used to convey information. When the prior code detector 204 informs the message generator 210 that the audio 104 has not been previously encoded, the pre-existing code flag information 230 is not used. Accordingly, the message output by the message generator only includes the synchronization symbol 224, the 42 bits of message information 226, and the seven bits of message information 228; the pre-existing code flag information 230 is blank or filled by unused symbol indications. In contrast, when the prior code detector 204 provides to the message generator 210 an indication that the audio 104 into which the message information is to be encoded has previously been encoded, the message generator 210 will not output the synchronization symbol 224, the 42 bits of message information 226, or the seven bits of message information 228. Rather, the message generator 210 will utilize only the pre-existing code flag information 230. In one example, the pre-existing code flag information will include a pre-existing code flag synchronization symbol to signal that pre-existing code flag information is present. The pre-existing code flag synchronization symbol is different from the synchronization symbol 224 and, therefore, can be used to signal the start of pre-existing code flag information. Upon receipt of the pre-existing code flag synchronization symbol, a decoder can ignore any prior-received information that aligned in time with a synchronization symbol 224, 42 bits of message information 226, or seven bits of message information 228. To convey information, such as a channel indication, a distribution identification, or any other suitable information, a single pre-existing code flag information symbol follows the pre-existing code flag synchronization symbol. This pre-existing code flag information may be used to provide for proper crediting in an audience monitoring system.
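The two message variants described above can be sketched as follows. This is a minimal illustration, assuming that the 42 bits and the seven bits are concatenated most significant bit first and then cut into 7-bit symbols; the bit ordering, the placeholder values `SYNC` and `FLAG_SYNC`, and the function names are hypothetical and are not specified by the example.

```python
SYNC = "SYNC"            # hypothetical stand-in for synchronization symbol 224
FLAG_SYNC = "FLAG_SYNC"  # hypothetical stand-in for the pre-existing code flag sync symbol


def build_message(info_42, time_7):
    """Primary message: sync symbol followed by 49 bits as seven 7-bit symbols."""
    if not 0 <= info_42 < 2 ** 42:
        raise ValueError("info_42 must fit in 42 bits")
    if not 0 <= time_7 < 2 ** 7:
        raise ValueError("time_7 must fit in 7 bits")
    payload = (info_42 << 7) | time_7  # 49-bit payload, time bits last (assumed order)
    symbols = [(payload >> shift) & 0x7F for shift in range(42, -1, -7)]
    return [SYNC] + symbols


def build_flag_message(flag_symbol):
    """Pre-existing code case: flag sync symbol followed by one information symbol."""
    if not 0 <= flag_symbol < 2 ** 7:
        raise ValueError("flag_symbol must fit in 7 bits")
    return [FLAG_SYNC, flag_symbol]


message = build_message(info_42=0, time_7=85)
flag_message = build_flag_message(5)
```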

The output from the message generator 210 is passed to the symbol selector 212, which selects representative symbols. When the synchronization symbol 224 is output, the symbol selector may not need to perform any mapping because the synchronization symbol 224 is already in symbol format. Alternatively, if bits of information are output from the message generator 210, the symbol selector may use straight mapping, wherein, for example, seven bits output from the message generator 210 are mapped to a symbol having the decimal value of the seven bits. For example, if a value of 1010101 is output from the message generator 210, the symbol selector may map those bits to the symbol 85. Of course, other conversions between bits and symbols may be used. In certain examples, redundancy or error encoding may be used in the selection of symbols to represent bits. Additionally, any other suitable number of bits than seven may be selected to be converted into symbols. The number of bits used to select the symbol may be determined based on the maximum symbol space available in the communication system. For example, if the communication system can only transmit one of four symbols at a time, then only two bits from the message generator 210 would be converted into symbols at a time.
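The straight mapping described above amounts to reading each group of bits as an unsigned binary number. A minimal sketch, in which the function name `bits_to_symbols` and the string representation of the bits are assumptions made for illustration:

```python
def bits_to_symbols(bits, bits_per_symbol=7):
    """Map a string of '0'/'1' characters to symbols by straight binary-to-decimal mapping."""
    if len(bits) % bits_per_symbol:
        raise ValueError("bit count must be a multiple of bits_per_symbol")
    return [int(bits[i:i + bits_per_symbol], 2)
            for i in range(0, len(bits), bits_per_symbol)]


seven_bit = bits_to_symbols("1010101")               # the 7-bit example from the text
two_bit = bits_to_symbols("1101", bits_per_symbol=2)  # a four-symbol system, two bits at a time
```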

The symbols from the symbol selector 212 are passed to the code frequency selector 214 that selects code frequencies that are used to represent the symbol. The code frequency selector 214 may include one or more look up tables (LUTs) 232 that may be used to map the symbols into code frequencies that represent the symbols. That is, a symbol is represented by a plurality of code frequencies that the encoder 102 emphasizes in the audio to form encoded audio that is transmitted. Upon receipt of the encoded audio, a decoder detects the presence of the emphasized code frequencies and decodes the pattern of emphasized code frequencies into the transmitted symbol. Thus, the same LUT selected at the encoder 102 for selecting the code frequencies needs to be used in the decoder. One example LUT is described in conjunction with FIGS. 3-5. Additionally, example techniques for generating LUTs are provided in conjunction with FIGS. 7-9.

The code frequency selector 214 may select any number of different LUTs depending on various criteria. For example, a particular LUT or set of LUTs may be used by the code frequency selector 214 in response to the prior receipt of a particular synchronization symbol. Additionally, if the prior code detector 204 indicates that a message was previously encoded into the audio 104, the code frequency selector 214 may select a lookup table that is unique to pre-existing code situations to avoid confusion between frequencies used to previously encode the audio 104 and the frequencies used to include the pre-existing code flag information.
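The role of a shared LUT at the encoder and the decoder can be illustrated with a toy table. The four-symbol table below and its frequency indices are invented for illustration and do not correspond to the LUT 232; the decoding rule (choose the symbol whose indices overlap most with the detected ones) is likewise an assumption, offered only as one plausible way to decode a pattern of emphasized frequencies.

```python
# Hypothetical table: symbol -> code frequency indices (values invented for illustration).
TOY_LUT = {
    0: (180, 250, 330),
    1: (185, 260, 345),
    2: (190, 270, 360),
    3: (195, 280, 375),
}


def encode_symbol(symbol, lut):
    """Encoder side: look up the code frequency indices to emphasize."""
    return lut[symbol]


def decode_symbol(emphasized, lut):
    """Decoder side: pick the symbol whose indices best match the detected ones."""
    detected = set(emphasized)
    return max(lut, key=lambda symbol: len(detected & set(lut[symbol])))


round_trip = decode_symbol(encode_symbol(2, TOY_LUT), TOY_LUT)
partial = decode_symbol([185, 345], TOY_LUT)  # one frequency missed by the detector
```

The round trip only succeeds because both sides consult the same table, which is the reason the same LUT must be used at the encoder and the decoder.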

An indication of the code frequencies that are selected to represent a particular symbol is provided to the synthesizer 216. The synthesizer 216 may store, for each short block constituting a long block, three complex Fourier coefficients representative of each of the possible code frequencies that the code frequency selector 214 will indicate. These coefficients represent the transform of a windowed sinusoidal code frequency signal whose phase angle corresponds to the starting phase angle of the code sinusoid in that short block.
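One way to picture the starting phase angle mentioned above is sketched below. The sketch assumes that a code sinusoid with frequency index k completes k cycles per 9216-sample long block, starts the long block at zero phase, and that consecutive short blocks begin 256 samples apart; none of these assumptions is stated in the example, and the function name `starting_phase` is hypothetical.

```python
import math


def starting_phase(freq_index, short_block, long_block=9216, hop=256):
    """Phase (radians, in [0, 2*pi)) of the code sinusoid at the start of a short block."""
    # Work in whole samples first so the reduction modulo one cycle is exact.
    elapsed = freq_index * hop * short_block
    fraction_of_cycle = (elapsed % long_block) / long_block
    return 2 * math.pi * fraction_of_cycle


phase_first_block = starting_phase(189, 0)   # zero by construction
phase_second_block = starting_phase(189, 1)  # 189 * 256 / 9216 = 5.25 cycles
```

Under these assumptions, index 189 advances by a quarter cycle (beyond whole cycles) from one short block to the next.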

While the foregoing describes an example code synthesizer 216 that generates sine waves or data representing sine waves, other example implementations of code synthesizers are possible. For example, rather than generating sine waves, another example code synthesizer 216 may output Fourier coefficients in the frequency domain that are used to adjust amplitudes of certain frequencies of audio provided to the combiner 220. In this manner, the spectrum of the audio may be adjusted to include the requisite sine waves.

The three complex amplitude-adjusted Fourier coefficients corresponding to the symbol to be transmitted are provided from the synthesizer 216 to the inverse Fourier transform 218, which converts the coefficients into time-domain signals having the prescribed frequencies and amplitudes to allow their insertion into the audio to convey the desired symbols. These time-domain signals are coupled to the combiner 220. The combiner 220 also receives the audio. In particular, the combiner 220 inserts the signals from the inverse Fourier transform 218 into one long block of audio samples. As described above, for a given sampling rate of 48 kHz, a long block is 9216 audio samples. In the provided example, the synchronization symbol and 49 bits of information require a total of eight long blocks. Because each long block is 9216 audio samples, only 73,728 samples of audio 104 are needed to encode a given message. However, because messages begin every two seconds, which is every 96,000 audio samples, there will be many samples at the end of the 96,000 audio samples that are not encoded. The combining can be done in the digital domain, or in the analog domain.

However, in the case of a pre-existing code flag, the pre-existing code flag is inserted into the audio 104 after the last symbol representing the previously inserted seven bits of message information. Accordingly, insertion of the pre-existing code flag information begins at sample 73,729 and runs for two long blocks, or 18,432 samples. Accordingly, when pre-existing code flag information is used, fewer of the 96,000 audio samples 104 will be unencoded.
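The sample positions given above can be checked with a short calculation. The variable names are illustrative, and sample positions are counted from 1 to match the text.

```python
LONG_BLOCK = 9216         # samples per symbol
MESSAGE_PERIOD = 96_000   # samples between message starts (two seconds at 48 kHz)
MESSAGE_SYMBOLS = 8       # synchronization symbol plus seven information symbols
FLAG_SYMBOLS = 2          # flag synchronization symbol plus one flag information symbol

message_samples = MESSAGE_SYMBOLS * LONG_BLOCK              # samples used by the message
flag_first_sample = message_samples + 1                     # first sample of the flag
flag_samples = FLAG_SYMBOLS * LONG_BLOCK                    # samples used by the flag
flag_last_sample = message_samples + flag_samples           # last sample of the flag

unencoded_without_flag = MESSAGE_PERIOD - message_samples   # tail left unencoded
unencoded_with_flag = MESSAGE_PERIOD - flag_last_sample     # shorter tail when flag is used
```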

The masking lineup 206 includes an overlapping short block maker 240 that makes short blocks of 512 audio samples, wherein 256 of the samples are old and 256 samples are new. That is, the overlapping short block maker 240 makes blocks of 512 samples, wherein 256 samples are shifted into or out of the buffer at one time. For example, when a first set of 256 samples enters the buffer, the oldest 256 samples are shifted out of the buffer. On a subsequent iteration, the first set of 256 samples is shifted to a later position of the buffer and 256 samples are shifted into the buffer. Each time a new short block is made by shifting in 256 new samples and removing the 256 oldest samples, the new short block is provided to a masking evaluator 242. The 512 sample block output from the overlapping short block maker 240 is multiplied by a suitable window function such that an “overlap-and-add” operation will restore the audio samples to their correct value at the output. A synthesized code signal to be added to an audio signal is also similarly windowed to prevent abrupt transitions at block edges when there is a change in code amplitude from one 512-sample block to the next overlapped 512-sample block. These transitions, if present, create audible artifacts.
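The overlapping block structure and the overlap-and-add property can be sketched as follows. The example above does not name the window function; the squared-sine (Hann-type) window used here is an assumption, chosen because two copies offset by 256 samples sum to one. The function names are hypothetical.

```python
import math

BLOCK = 512  # short block length
HOP = 256    # new samples per short block (50% overlap)

# Assumed window: sin^2, for which WINDOW[n] + WINDOW[n + HOP] == 1.
WINDOW = [math.sin(math.pi * n / BLOCK) ** 2 for n in range(BLOCK)]


def short_blocks(samples):
    """Yield (start, block) pairs of 512-sample blocks that advance by 256 samples."""
    for start in range(0, len(samples) - BLOCK + 1, HOP):
        yield start, samples[start:start + BLOCK]


def overlap_add(samples):
    """Window every short block and sum the overlapping windowed blocks."""
    out = [0.0] * len(samples)
    for start, block in short_blocks(samples):
        for n, value in enumerate(block):
            out[start + n] += value * WINDOW[n]
    return out


audio = [math.sin(0.01 * n) for n in range(4 * HOP)]
restored = overlap_add(audio)
```

Samples covered by two overlapping blocks are restored to their original values; the first and last 256 samples of this short test signal are covered by only one block and therefore remain attenuated.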

The masking evaluator 242 receives samples of the overlapping shortblock (e.g., 512 samples) and determines an ability of the same to hidecode frequencies to human hearing. That is, the masking evaluatordetermines if code frequencies can be hidden within the audiorepresented by the short block by evaluating each critical band of theaudio as a whole to determine its energy and determining the noise-likeor tonal-like attributes of each critical band and determining the sumtotal ability of the critical bands to mask the code frequencies.According to the illustrated example, the bandwidth of the criticalbands increases with frequency. If the masking evaluator 242 determinesthat code frequencies can be hidden in the audio 104, the maskingevaluator 204 indicates the amplitude levels at which the codefrequencies can be inserted within the audio 104, while still remaininghidden and provides the amplitude information to the synthesizer 216.

In one example, the masking evaluator 242 conducts the masking evaluation by determining a maximum change in energy E_(b) or a masking energy level that can occur at any critical frequency band without making the change perceptible to a listener. The masking evaluation carried out by the masking evaluator 242 may be carried out as outlined in the Moving Pictures Experts Group-Advanced Audio Coding (MPEG-AAC) audio compression standard ISO/IEC 13818-7:1997, for example. The acoustic energy in each critical band influences the masking energy of its neighbors, and algorithms for computing the masking effect are described in standards documents such as ISO/IEC 13818-7:1997. These analyses may be used to determine, for each short block, the masking contribution due to tonality (i.e., how much the audio being evaluated is like a tone) as well as noise-like features (i.e., how much the audio being evaluated is like noise). Further analysis can evaluate temporal masking, which extends the masking ability of the audio over a short time, typically 50-100 milliseconds (ms). The resulting analysis by the masking evaluator 242 provides a determination, on a per critical band basis, of the amplitude of a code frequency that can be added to the audio 104 without producing any noticeable audio degradation (e.g., without being audible).

Because a 256 sample block will appear in both the beginning of one short block and the end of the next short block and, thus, will be evaluated two times by the masking evaluator 242, the masking evaluator makes two masking evaluations including the 256 sample block. The amplitude indication provided to the synthesizer 216 is a composite of those two evaluations including that 256 sample block and the amplitude indication is timed such that the amplitude of the code inserted into the 256 samples is timed with those samples arriving at the combiner 220.

Referring now to FIGS. 3-5, an example LUT 232 is shown that includes one column representing symbols 302 and seven columns 304, 306, 308, 310, 312, 314, 316 representing numbered code frequency indices. The LUT 232 includes 128 rows, which are used to represent data symbols. Because the LUT 232 includes 128 different data symbols, data may be sent at a rate of seven bits per symbol. The frequency indices in the table may range from 180 to 656 and are based on a long block size of 9216 samples and a sampling rate of 48 kHz. Accordingly, with a spacing of 48000/9216 Hz (approximately 5.2 Hz) per index, the frequencies corresponding to these indices range between 937.5 Hz and approximately 3416.7 Hz, which falls within the humanly audible range. Of course, other sampling rates and frequency indices may be selected. A description of a process to generate a LUT, such as the table 232, is provided in conjunction with FIGS. 7-9.
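The relationship between a code frequency index and its frequency, and between the number of data symbols and the bits carried per symbol, can be checked with a short sketch. The function names are hypothetical; the block size and sampling rate are the example values given above.

```python
SAMPLE_RATE_HZ = 48_000  # sampling rate from the text
LONG_BLOCK = 9_216       # long block size in samples from the text

def index_to_hz(index, block=LONG_BLOCK, rate=SAMPLE_RATE_HZ):
    """Frequency of a DFT bin: index times the bin spacing rate/block."""
    return index * rate / block

def bits_per_symbol(num_data_symbols):
    """Whole bits conveyed by one symbol (floor of log2 of the symbol count)."""
    return num_data_symbols.bit_length() - 1

print(index_to_hz(180))      # lowest index in the example table
print(bits_per_symbol(128))  # 128 data symbols
```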

In one example operation of the code frequency selector 214, a symbol of 25 (e.g., a binary value of 0011001) is received from the symbol selector 212. The code frequency selector 214 accesses the LUT 232 and reads row 25 of the symbol column 302. From this row, the code frequency selector reads that code frequency indices 217, 288, 325, 403, 512, 548, and 655 are to be emphasized in the audio 104 to communicate the symbol 25 to the decoder. The code frequency selector 214 then provides an indication of these indices to the synthesizer 216, which synthesizes the code signals by outputting Fourier coefficients corresponding to these indices.

The combiner 220 receives both the output of the code synthesizer 216 and the audio 104 and combines them to form encoded audio. The combiner 220 may combine the output of the code synthesizer 216 and the audio 104 in an analog or digital form. If the combiner 220 performs a digital combination, the output of the code synthesizer 216 may be combined with the output of a sampler, rather than the audio that is input to the sampler. For example, the audio block in digital form may be combined with the sine waves in digital form. Alternatively, the combination may be carried out in the frequency domain, wherein frequency coefficients of the audio are adjusted in accordance with frequency coefficients representing the sine waves. As a further alternative, the sine waves and the audio may be combined in analog form. The encoded audio may be output from the combiner 220 in analog or digital form. If the output of the combiner 220 is digital, it may be subsequently converted to analog form before being coupled to the transmitter 106.

An example encoding process 600 is shown in FIG. 6. The example process 600 may be carried out by the example encoder 102 of FIG. 2, or by any other suitable encoder. The example process 600 begins when audio samples to be encoded are received (block 602). The process 600 then determines if the received samples have been previously encoded (block 604). This determination may be carried out, for example, by the prior code detector 204 of FIG. 2, or by any suitable decoder configured to examine the audio to be encoded for evidence of a prior encoding.

If the received samples have not been previously encoded (block 604), the process 600 generates a communication message (block 606), such as a communication message having the format shown in FIG. 2 at reference numeral 222. In one particular example, when the audio has not been previously encoded, the communication message may include a synchronization portion and one or more portions including data bits. The communication message generation may be carried out, for example, by the message generator 210 of FIG. 2.

The communication message is then mapped into symbols (block 608). For example, the synchronization information need not be mapped into a symbol if the synchronization information is already a symbol. In another example, if a portion of the communication message is a series of bits, such bits or groups of bits may be represented by one symbol. As described above in conjunction with the symbol selector 212, which is one manner in which the mapping (block 608) may be carried out, one or more tables or encoding schemes may be used to convert bits into symbols. For example, some techniques may include the use of error correction coding, or the like, to increase message robustness through the use of coding gain. In one particular example implementation having a symbol space sized to accommodate 128 data symbols, seven bits may be converted into one symbol. Of course, other numbers of bits may be processed depending on many factors including available symbol space, error correction encoding, etc.
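As a minimal sketch of the seven-bits-per-symbol case, the following groups a bit sequence into symbol values. It assumes most-significant-bit-first ordering and omits any error correction coding; both choices, and the function name, are illustrative assumptions.

```python
def bits_to_symbols(bits, bits_per_symbol=7):
    """Group bits (MSB first, an assumed ordering) into integer symbols."""
    if len(bits) % bits_per_symbol:
        raise ValueError("bit count must be a multiple of bits_per_symbol")
    symbols = []
    for start in range(0, len(bits), bits_per_symbol):
        value = 0
        for bit in bits[start:start + bits_per_symbol]:
            value = (value << 1) | bit  # append the next bit on the right
        symbols.append(value)
    return symbols

# The binary value 0011001 corresponds to the symbol 25.
print(bits_to_symbols([0, 0, 1, 1, 0, 0, 1]))
```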

After the communication symbols have been selected (block 608), the process 600 selects a LUT that will be used to determine the code frequencies that will be used to represent each symbol (block 610). In one example, the selected LUT may be the example LUT 232 of FIGS. 3-5, or may be any other suitable LUT. Additionally, the LUT may be any LUT generated as described in conjunction with FIGS. 7-9. The selection of the LUT may be based on a number of factors including the synchronization symbol that is selected during the generation of the communication message (block 606).

After the symbols have been generated (block 608) and the LUT is selected (block 610), the symbols are mapped into code frequencies using the selected LUT (block 612). In one example in which the LUT 232 of FIGS. 3-5 is selected, a symbol of, for example, 35 would be mapped to the frequency indices 218, 245, 360, 438, 476, 541, and 651. The data space in the LUT is between symbol 0 and symbol 127, and symbol 128, which uses a unique set of code frequencies that do not match any other code frequencies in the table, is used to indicate a synchronization symbol. The LUT selection (block 610) and the mapping (block 612) may be carried out by, for example, the code frequency selector 214 of FIG. 2. After the code frequencies are selected, an indication of the same is provided to, for example, the synthesizer 216 of FIG. 2.

Code signals including the code frequencies are then synthesized (block 614) at amplitudes according to a masking evaluation, which is described in conjunction with blocks 240 and 242 of FIG. 2, and is described in conjunction with the process 600 below. In one example, the synthesis of the code frequency signals may be carried out by providing appropriately scaled Fourier coefficients to an inverse Fourier process. In one particular example, three Fourier coefficients may be output to represent each code frequency in the code frequency signals. Accordingly, the code frequencies may be synthesized by the inverse Fourier process in a manner in which the synthesized frequencies are windowed to prevent spill over into other portions of the signal into which the code frequency signals are being embedded. One example configuration that may be used to carry out the synthesis of block 614 is shown at blocks 216 and 218 of FIG. 2. Of course, other implementations and configurations are possible.

After the code signals including the code frequencies have been synthesized, they are combined with the audio samples (block 616). As described in conjunction with FIG. 2, the combination of the code signals and the audio is such that one symbol is inserted into each long block of audio samples. Accordingly, to communicate one synchronization symbol and 49 data bits, information is encoded into eight long blocks of audio information: one long block for the synchronization symbol and one long block for each seven bits of data (assuming seven bits/symbol encoding). The messages are inserted into the audio at two second intervals. Thus, the eight long blocks of audio immediately following the start of a message may be encoded and the remaining long blocks that make up the balance of the two seconds of audio may be unencoded.
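The block count above follows from simple arithmetic, sketched below. The duration figure assumes back-to-back (non-overlapping) long blocks of 9216 samples at 48 kHz, which is an assumption made only for illustration; the function names are hypothetical.

```python
def long_blocks_per_message(data_bits=49, bits_per_symbol=7, sync_symbols=1):
    """One long block per symbol: sync symbols plus ceil(data_bits / bits)."""
    data_symbols = -(-data_bits // bits_per_symbol)  # ceiling division
    return sync_symbols + data_symbols

def long_block_seconds(block=9216, rate_hz=48_000):
    """Duration of one long block at the encoder sampling rate."""
    return block / rate_hz

blocks = long_blocks_per_message()
print(blocks, blocks * long_block_seconds())
```

Under these assumptions the encoded portion of each message occupies less than the two second message interval, consistent with the balance of the interval being unencoded.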

The insertion of the code signal into the audio may be carried out by adding samples of the code signal to samples of the host audio signal, wherein such addition is done in the analog domain or in the digital domain. Alternatively, with proper frequency alignment and registration, frequency components of the audio signal may be adjusted in the frequency domain and the adjusted spectrum converted back into the time domain.

The foregoing described the operation of the process 600 when the process determined that the received audio samples have not been previously encoded (block 604). However, in situations in which a portion of media has been through a distribution chain and encoded as it was processed, the received samples of audio processed at block 604 already include codes. For example, a local television station using a courtesy news clip from CNN in a local news broadcast might not get viewing credit based on the prior encoding of the CNN clip. As such, additional information is added to the local news broadcast in the form of pre-existing code flag information. If the received samples of audio have been previously encoded (block 604), the process generates pre-existing code flag information (block 618). The pre-existing code flag information may include the generation of a pre-existing code flag synchronization symbol and, for example, the generation of seven bits of data, which will be represented by a single data symbol. The data symbol may represent a station identification, a time, or any other suitable information. For example, a media monitoring site (MMS) may be programmed to detect the pre-existing code flag information to credit the station identified therein.

After the pre-existing code flag information has been generated (block 618), the process 600 selects the pre-existing code flag LUT that will be used to identify code frequencies representative of the pre-existing code flag information (block 620). In one example, the pre-existing code flag LUT may be different than other LUTs used in non-pre-existing code conditions. In one particular example, the pre-existing code flag synchronization symbol may be represented by the code frequencies 220, 292, 364, 436, 508, 580, and 652.

After the pre-existing code flag information is generated (block 618) and the pre-existing code flag LUT is selected (block 620), the pre-existing code flag symbols are mapped to code frequencies (block 612), and the remainder of the processing follows as previously described.

Sometime before the code signal is synthesized (block 614), the process 600 conducts a masking evaluation to determine the amplitude at which the code signal should be generated so that it still remains inaudible or substantially inaudible to human hearers. Accordingly, the process 600 generates overlapping short blocks of audio samples, each containing 512 audio samples (block 622). As described above, the overlapping short blocks include 50% old samples and 50% newly received samples. This operation may be carried out by, for example, the overlapping short block maker 240 of FIG. 2.

After the overlapping short blocks are generated (block 622), masking evaluations are performed on the short blocks (block 624). For example, this may be carried out as described in conjunction with block 242 of FIG. 2. The results of the masking evaluation are used by the process 600 at block 614 to determine the amplitude of the code signal to be synthesized. Because the overlapping short block methodology may yield two masking evaluations for a particular 256 samples of audio (one when the 256 samples are the “new samples,” and one when the 256 samples are the “old samples”), the result provided to block 614 of the process 600 may be a composite of these masking evaluations. Of course, the timing of the process 600 is such that the masking evaluations for a particular block of audio are used to determine code amplitudes for that block of audio.

Lookup Table Generation

A system 700 for populating one or more LUTs with code frequencies corresponding to symbols may be implemented using hardware, software, combinations of hardware and software, firmware, or the like. The system 700 of FIG. 7 may be used to generate any number of LUTs, such as the LUT of FIGS. 3-5. The system 700, which operates as described below in conjunction with FIG. 7 and FIG. 8, results in a code frequency index LUT, wherein: (1) any two symbols of the table are represented by no more than one common frequency index, (2) not more than one of the frequency indices representing a symbol resides in one audio critical band as defined by the MPEG-AAC compression standard ISO/IEC 13818-7:1997, and (3) code frequencies of neighboring critical bands are not used to represent a single symbol. Criterion (3) helps to ensure that audio quality is not compromised during the audio encoding process.

A critical band pair definer 702 defines a number (P) of critical band pairs. For example, referring to FIG. 9, a table 900 includes columns representing AAC critical band indices 902, short block indices 904 in the range of the AAC indices, and long block indices 906 in the range of the AAC indices. In one example, the value of P may be seven and, thus, seven critical band pairs are formed from the AAC indices (block 802). FIG. 10 shows the frequency relationship between the AAC indices. According to one example, as shown at reference numeral 1002 in FIG. 10, wherein frequencies of critical band pairs are shown as separated by dotted lines, AAC indices may be selected into pairs as follows: five and six, seven and eight, nine and ten, eleven and twelve, thirteen and fourteen, fifteen and sixteen, and seventeen and seventeen. The AAC index of seventeen includes a wide range of frequencies and, therefore, index 17 is shown twice, once for the low portion and once for the high portion.

A frequency definer 704 defines a number of frequencies (N) that are selected for use in each critical band pair. In one example, the value of N is sixteen, meaning that there are sixteen data positions in the combination of the critical bands that form each critical band pair. Reference numeral 1004 in FIG. 10 identifies the seventeen frequency positions, that is, the sixteen data positions plus one reserved position. The circled position four is reserved for synchronization information and, therefore, is not used for data.

A number generator 706 defines a number of frequency positions in the critical band pairs defined by the critical band pair definer 702. In one example, the number generator 706 generates all N^(P) P-digit numbers. For example, if N is 16 and P is 7, the process generates the 268435456 numbers 0 through 268435455, but may do so in base 16 (hexadecimal), which would result in the values 0 through FFFFFFF.

A redundancy reducer 708 then eliminates all numbers from the generated list of numbers sharing more than one common digit between them in the same position. This ensures compliance with criterion (1) above because, as described below, the digits will be representative of the frequencies selected to represent symbols. An excess reducer 710 may then further reduce the remaining numbers from the generated list of numbers to the number of needed symbols. For example, if the symbol space is 129 symbols, the remaining numbers are reduced to a count of 129. The reduction may be carried out at random, or by selecting remaining numbers with the greatest Euclidean distance, or by any other suitable data reduction technique. In another example, the reduction may be carried out in a pseudorandom manner.

After the foregoing reductions, the count of the list of numbers is equal to the number of symbols in the symbol space. Accordingly, a code frequency definer 712 defines the remaining numbers in base N format to represent frequency indices representative of symbols in the critical band pairs. For example, referring to FIG. 10, the hexadecimal number F1E4B0F is in base 16, which matches N, and has seven digits, which matches P. The first digit of the hexadecimal number maps to a frequency component in the first critical band pair, the second digit to the second critical band pair, and so on. Each digit represents the frequency index that will be used to represent the symbol corresponding to the hexadecimal number F1E4B0F.

Using the first hexadecimal digit as an example of mapping to a particular frequency index, the decimal value of Fh is 15. Because position four of each critical band pair is reserved for non-data information, the value of any hexadecimal digit greater than four is incremented by the value of one decimal. Thus, the 15 becomes a 16. The 16 is thus designated (as shown with the asterisk in FIG. 10) as being the code frequency component in the first critical band pair to represent the symbol corresponding to the hexadecimal number F1E4B0F. Though not shown in FIG. 10, the index 1 position (e.g., the second position from the far left) in the critical band 7 would be used to represent the hexadecimal number F1E4B0F.

A LUT filler 714 receives the symbol indications and corresponding code frequency component indications from the code frequency definer 712 and fills this information into a LUT.

An example code frequency index table generation process 800 is shown in FIG. 8. The process 800 may be implemented using the system of FIG. 7, or any other suitable configuration. The process 800 of FIG. 8 may be used to generate any number of LUTs, such as the LUT of FIGS. 3-5. While one example process 800 is shown, other processes may be used. The result of the process 800 is a code frequency index LUT, wherein: (1) any two symbols of the table are represented by no more than one common frequency index, (2) not more than one of the frequency indices representing a symbol resides in one audio critical band as defined by the MPEG-AAC compression standard ISO/IEC 13818-7:1997, and (3) code frequencies of neighboring critical bands are not used to represent a single symbol. Criterion (3) helps to ensure that audio quality is not compromised during the audio encoding process.

The process 800 begins by defining a number (P) of critical band pairs. For example, referring to FIG. 9, a table 900 includes columns representing AAC critical band indices 902, short block indices 904 in the range of the AAC indices, and long block indices 906 in the range of the AAC indices. In one example, the value of P may be seven and, thus, seven critical band pairs are formed from the AAC indices (block 802). FIG. 10 shows the frequency relationship between the AAC indices. According to one example, as shown at reference numeral 1002 in FIG. 10, wherein frequencies of critical band pairs are shown as separated by dotted lines, AAC indices may be selected into pairs as follows: five and six, seven and eight, nine and ten, eleven and twelve, thirteen and fourteen, fifteen and sixteen, and seventeen and seventeen. The AAC index of seventeen includes a wide range of frequencies and, therefore, index 17 is shown twice, once for the low portion and once for the high portion.

After the band pairs have been defined (block 802), a number of frequencies (N) is selected for use in each critical band pair (block 804). In one example, the value of N is sixteen, meaning that there are sixteen data positions in the combination of the critical bands that form each critical band pair. The seventeen frequency positions, that is, the sixteen data positions plus one reserved position, are shown in FIG. 10 at reference numeral 1004. The circled position four is reserved for synchronization information and, therefore, is not used for data.

After the number of critical band pairs and the number of frequency positions in the pairs are defined, the process 800 generates all N^(P) P-digit numbers with no more than one hexadecimal digit in common (block 806). For example, if N is 16 and P is 7, the process considers the numbers 0 through 268435455, but may do so in base 16 (hexadecimal), which would result in 0 through FFFFFFF, but does not include the numbers that share more than one common hexadecimal digit. This ensures compliance with criterion (1) above because, as described below, the digits will be representative of the frequencies selected to represent symbols.

According to an example process for determining a set of numbers that comply with criterion (1) above (and any other desired criteria), the numbers in the range from 0 to N^(P)−1 are tested. First, the value corresponding to zero is stored as the first member of the result set R. Then, the numbers from 1 to N^(P)−1 are selected for analysis to determine if they meet criterion (1) when compared to the members of R. Each number that meets criterion (1) when compared against all the current entries in R is added to the result set. In particular, according to the example process, in order to test a number K, each hexadecimal digit of interest in K is compared to the corresponding hexadecimal digit of interest in an entry M from the current result set. In the 7 comparisons, not more than one hexadecimal digit of K should equal the corresponding hexadecimal digit of M. If, after comparing K against all numbers currently in the result set, no member of the latter has more than one hexadecimal digit in common with K, then K is added to the result set R. The algorithm iterates through the set of possible numbers until all values meeting criterion (1) have been identified.
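The example process above can be sketched as a greedy search. The following is a minimal illustration, not a tuned implementation: the function names and the optional `limit` parameter are hypothetical, and the demonstration uses a small base and digit count because a pure-Python scan of all 16^7 candidates would be slow.

```python
def to_digits(value, base, count):
    """Return `value` as `count` digits in `base`, most significant first."""
    out = []
    for _ in range(count):
        out.append(value % base)
        value //= base
    return out[::-1]

def shares_more_than_one_digit(k_digits, m_digits):
    """True if two numbers agree in more than one digit position."""
    return sum(a == b for a, b in zip(k_digits, m_digits)) > 1

def build_result_set(base, count, limit=None):
    """Greedily collect numbers that pairwise share at most one digit."""
    result = [to_digits(0, base, count)]  # zero is the first member of R
    for k in range(1, base ** count):
        if limit is not None and len(result) >= limit:
            break  # stop early once enough numbers are found
        candidate = to_digits(k, base, count)
        if not any(shares_more_than_one_digit(candidate, m) for m in result):
            result.append(candidate)
    return result

# Small demonstration: base 4, three digits.
for member in build_result_set(4, 3):
    print(member)
```

Calling `build_result_set(16, 7, limit=129)` would correspond to the N = 16, P = 7 example with a 129-symbol space, at a correspondingly higher run time.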

While the foregoing describes an example process for determining a set of numbers that meets criterion (1), any process or algorithm may be used and this disclosure is not limited to the process described above. For example, a process may use heuristics, rules, etc. to eliminate numbers from the set of numbers before iterating through the set. For example, all of the numbers where the relevant digits start with two 0's, two 1's, two 2's, etc. and end with two 0's, two 1's, two 2's, etc. could immediately be removed because they will definitely have a Hamming distance less than 6. Additionally or alternatively, an example process may not iterate through the entire set of possible numbers. For example, a process could iterate until enough numbers are found (e.g., 128 numbers when 128 symbols are desired). In another implementation, the process may randomly select a first value for inclusion in the set of possible values and then may search iteratively or randomly through the remaining set of numbers until a value that meets the desired criteria (e.g., criterion (1)) is found.

The process 800 then selects the desired numbers from the generated values (block 810). For example, if the symbol space is 129 symbols, the remaining numbers are reduced to a count of 129. The reduction may be carried out at random, or by selecting remaining numbers with the greatest Euclidean distance, or by any other suitable data reduction technique.

After the foregoing reductions, the count of the list of numbers is equal to the number of symbols in the symbol space. Accordingly, the remaining numbers in base N format are defined to represent frequency indices representative of symbols in the critical band pairs (block 812). For example, referring to FIG. 10, the hexadecimal number F1E4B0F is in base 16, which matches N, and has seven digits, which matches P. The first digit of the hexadecimal number maps to a frequency component in the first critical band pair, the second digit to the second critical band pair, and so on. Each digit represents the frequency index that will be used to represent the symbol corresponding to the hexadecimal number F1E4B0F.

Using the first hexadecimal digit as an example of mapping to a particular frequency index, the decimal value of Fh is 15. Because position four of each critical band pair is reserved for non-data information, the value of any hexadecimal digit greater than four is incremented by the value of one decimal. Thus, the 15 becomes a 16. The 16 is thus designated (as shown with the asterisk in FIG. 10) as being the code frequency component in the first critical band pair to represent the symbol corresponding to the hexadecimal number F1E4B0F. Though not shown in FIG. 10, the index 1 position (e.g., the second position from the far left) in the critical band 7 would be used to represent the hexadecimal number F1E4B0F.

After assigning the representative code frequencies (block 812), the numbers are filled into a LUT (block 814).

Of course, the systems and processes described in conjunction with FIGS. 8-10 are only examples that may be used to generate LUTs having desired properties in conjunction with the encoding and decoding systems described herein. Other configurations and processes may be used.

Audio Decoding

In general, the decoder 116 detects a code signal that was inserted into received audio to form encoded audio at the encoder 102. That is, the decoder 116 looks for a pattern of emphasis in code frequencies it processes. Once the decoder 116 has determined which of the code frequencies have been emphasized, the decoder 116 determines, based on the emphasized code frequencies, the symbol present within the encoded audio. The decoder 116 may record the symbols, or may decode those symbols into the codes that were provided to the encoder 102 for insertion into the audio.

In one implementation, the example decoder 116 of FIG. 11 may be implemented using, for example, a digital signal processor programmed with instructions to implement components of the decoder 116. Of course, any other implementation of the example decoder 116 is possible. For example, the decoder 116 may be implemented using one or more processors, programmable logic devices, or any suitable combination of hardware, software, and firmware.

As shown in FIG. 11, an example decoder 116 includes a sampler 1102, which may be implemented using an analog to digital converter (A/D) or any other suitable technology, to which encoded audio is provided in analog format. As shown in FIG. 1, the encoded audio may be provided by a wired or wireless connection to the receiver 110. The sampler 1102 samples the encoded audio at, for example, a sampling frequency of 8 kHz. Of course, other sampling frequencies may be advantageously selected in order to increase resolution or reduce the computational load at the time of decoding. At a sampling frequency of 8 kHz, the Nyquist frequency is 4 kHz and, therefore, all of the embedded code signal is preserved because its spectral frequencies are lower than the Nyquist frequency. The 9216-sample FFT long block length at the 48 kHz sampling rate is reduced to 1536 samples at the 8 kHz sampling rate. However, even at this modified DFT block size, the frequency spacing per index is unchanged, so the code frequency indices are identical to the original encoding frequencies and range from 180 to 656.

The samples from the sampler 1102 are provided to a stacker 1104. In general, the stacker 1104 accentuates the code signal in the audio signal information by taking advantage of the fact that messages are repeated or substantially repeated (e.g., only the least significant bits are changed) for a period of time. For example, 42 bits (226 of FIG. 2) of the 49 bits (226 and 224) of the previously described example message of FIG. 2 remain constant for 64 seconds (32 2-second message intervals) when the 42 bits of data 226 in the message include a station identifier and a coarse time stamp which increments once every 64 seconds. The variable data in the last 7 bit group 232 represents time increments in seconds and, thus, varies from message to message. The example stacker 1104 aggregates multiple blocks of audio signal information to accentuate the code signal in the audio signal information. In an example implementation, the stacker 1104 comprises a buffer to store multiple samples of audio information. For example, if a complete message is embedded in two seconds of audio, the buffer may be twelve seconds long to store six messages. The example stacker 1104 additionally comprises an adder to sum the audio signal information associated with the six messages and a divider to divide the sum by the number of repeated messages selected (e.g., six).

By way of example, a watermarked signal y(t) can be represented by the sum of the host signal x(t) and watermark w(t):

y(t)=x(t)+w(t)

In the time domain, watermarks may repeat after a known period T:

w(t)=w(t−T)

According to an example stacking method, the input signal y(t) is replaced by a stacked signal S(t):

${S(t)} = \frac{{y(t)} + {y\left( {t - T} \right)} + \ldots + {y\left( {t - {\left( {n - 1} \right)T}} \right)}}{n}$

In the stacked signal S(t), the contribution of the host signal decreases because the values of samples x(t), x(t−T), . . . , x(t−(n−1)T) are independent if the period T is sufficiently large. At the same time, the contribution of the watermark, being made of, for example, in-phase sinusoids, is enhanced relative to the host signal. Substituting y(t)=x(t)+w(t) and w(t)=w(t−T) into the stacked signal gives:

${S(t)} = {\frac{{x(t)} + {x\left( {t - T} \right)} + \ldots + {x\left( {t - {\left( {n - 1} \right)T}} \right)}}{n} + {w(t)}}$

Assuming x(t), x(t−T), . . . , x(t−(n−1)T) are independent random variables drawn from the same distribution X with zero mean E[X]=0, we obtain:

$\lim_{n \to \infty} E\left[ \frac{x(t) + x(t - T) + \ldots + x(t - (n - 1)T)}{n} \right] = 0$, and

$\mathrm{Var}\left[ \frac{x(t) + x(t - T) + \ldots + x(t - (n - 1)T)}{n} \right] = \frac{\mathrm{Var}(X)}{n}$

Accordingly, the underlying host signal contributions x(t), . . . , x(t−(n−1)T) will effectively cancel each other while the watermark is unchanged, allowing the watermark to be more easily detected.

In the illustrated example, the power of the host signal contribution to the resulting signal decreases in inverse proportion to the number of stacked signals n, consistent with the variance Var(X)/n above. Therefore, averaging over independent portions of the host signal can reduce the effects of interference. The watermark is not affected because it will always be added in-phase.
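The stacking operation can be sketched in a few lines. The function name and the choice to stack the most recent n periods of a sample list are assumptions made for illustration.

```python
def stack(samples, period, count):
    """Average the most recent `count` segments of `period` samples each.

    Implements S(t) = (y(t) + y(t - T) + ... + y(t - (n - 1)T)) / n with
    T = period and n = count, evaluated over one period of output.
    """
    needed = period * count
    if len(samples) < needed:
        raise ValueError("not enough samples to stack")
    recent = samples[-needed:]
    return [
        sum(recent[k * period + i] for k in range(count)) / count
        for i in range(period)
    ]

# Toy example: a repeating watermark plus a host signal that cancels.
watermark = [1.0, -1.0, 2.0, 0.0]
host = [3.0] * 4 + [-3.0] * 4
received = [h + watermark[i % 4] for i, h in enumerate(host)]
print(stack(received, period=4, count=2))
```

In the toy example the host segments cancel exactly, which is a contrived case; with real audio the host contribution is only reduced, as described by the variance expression above.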

An example process for implementing the stacker 1104 is described in conjunction with FIG. 12.

The decoder 116 may additionally include a stacker controller 1106 to control the operation of the stacker 1104. The example stacker controller 1106 receives a signal indicating whether the stacker 1104 should be enabled or disabled. For example, the stacker controller 1106 may receive the received audio signal and may determine if the signal includes significant noise that will distort the signal and, in response to the determination, cause the stacker to be enabled. In another implementation, the stacker controller 1106 may receive a signal from a switch that can be manually controlled to enable or disable the stacker 1104 based on the placement of the decoder 116. For example, when the decoder 116 is wired to the receiver 110 or the microphone 120 is placed in close proximity to the speaker 114, the stacker controller 1106 may disable the stacker 1104 because stacking will not be needed and will cause corruption of rapidly changing data in each message (e.g., the least significant bits of a timestamp). Alternatively, when the decoder 116 is located at a distance from the speaker 114 or in another environment where significant interference may be expected, the stacker 1104 may be enabled by the stacker controller 1106. Of course, any type of desired control may be applied by the stacker controller 1106.

The output of the stacker 1104 is provided to a time to frequency domain converter 1108. The time to frequency domain converter 1108 may be implemented using a discrete Fourier transformation (DFT), or any other suitable technique to convert time-based information into frequency-based information. In one example, the time to frequency domain converter 1108 may be implemented using a sliding long block fast Fourier transform (FFT) in which a spectrum of the code frequencies of interest is calculated each time eight new samples are provided to the example time to frequency domain converter 1108. In one example, the time to frequency domain converter 1108 uses 1,536 samples of the encoded audio and determines a spectrum therefrom using 192 slides of eight samples each. The resolution of the spectrum produced by the time to frequency domain converter 1108 increases as the number of samples used to generate the spectrum is increased. Thus, the number of samples processed by the time to frequency domain converter 1108 should match the resolution used to select the indices in the tables of FIGS. 3-5.

The spectrum produced by the time to frequency domain converter 1108 passes to a critical band normalizer 1110, which normalizes the spectrum in each of the critical bands. In other words, the frequency with the greatest amplitude in each critical band is set to one and all other frequencies within each of the critical bands are normalized accordingly. For example, if critical band one includes frequencies having amplitudes of 112, 56, 56, 56, 56, 56, and 56, the critical band normalizer 1110 would adjust the amplitudes to be 1, 0.5, 0.5, 0.5, 0.5, 0.5, and 0.5. Of course, any desired maximum value may be used in place of one for the normalization. The critical band normalizer 1110 outputs the normalized score for each of the frequencies of interest.
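The normalization described above can be sketched in a few lines. This is a minimal illustration rather than the implementation of the critical band normalizer 1110. The function name `normalize_critical_band` and the `max_value` parameter are hypothetical names chosen for this example.

```python
def normalize_critical_band(amplitudes, max_value=1.0):
    """Scale one critical band so its largest amplitude equals max_value."""
    if not amplitudes:
        return []
    peak = max(amplitudes)
    if peak == 0:
        # An all-zero band carries no information; leave it at zero.
        return [0.0 for _ in amplitudes]
    return [max_value * a / peak for a in amplitudes]


# Amplitudes of critical band one from the example in the text.
band_one = [112, 56, 56, 56, 56, 56, 56]
normalized = normalize_critical_band(band_one)
print(normalized)
```

Normalizing each band separately keeps a loud band from dominating the symbol scores computed from several bands.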

The spectrum of scores produced by the critical band normalizer 1110 is passed to the symbol scorer 1112, which calculates a total score for each of the possible symbols in the active symbol table. In an example implementation, the symbol scorer 1112 iterates through each symbol in the symbol table and sums the normalized score from the critical band normalizer 1110 for each of the frequencies of interest for the particular symbol to generate a score for the particular symbol. The symbol scorer 1112 outputs a score for each of the symbols to the max score selector 1114, which selects the symbol with the greatest score and outputs the symbol and the score.

The identified symbol and score from the max score selector 1114 are passed to the comparator 1116, which compares the score to a threshold. When the score exceeds the threshold, the comparator 1116 outputs the received symbol. When the score does not exceed the threshold, the comparator 1116 outputs an error indication. For example, the comparator 1116 may output a symbol indicating an error (e.g., a symbol not included in the active symbol table) when the score does not exceed the threshold. Accordingly, when a message has been corrupted such that a great enough score (i.e., a score that exceeds the threshold) is not calculated for a symbol, an error indication is provided. In an example implementation, error indications may be provided to the stacker controller 1106 to cause the stacker 1104 to be enabled when a threshold number of errors are identified (e.g., number of errors over a period of time, number of consecutive errors, etc.).
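The scoring, selection, and threshold comparison described above can be sketched as follows. This is an illustration only. The symbol table contents, the `ERROR_SYMBOL` marker, the threshold values, and all function names are hypothetical and are not taken from the tables of FIGS. 3-5.

```python
ERROR_SYMBOL = -1  # hypothetical marker for a symbol outside the active table


def score_symbols(symbol_table, normalized_spectrum):
    """Sum the normalized scores at each symbol's frequency indices."""
    return {
        symbol: sum(normalized_spectrum[i] for i in indices)
        for symbol, indices in symbol_table.items()
    }


def decide_symbol(symbol_table, normalized_spectrum, threshold):
    """Pick the best-scoring symbol; report an error if it is not above threshold."""
    scores = score_symbols(symbol_table, normalized_spectrum)
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best, scores[best]
    return ERROR_SYMBOL, scores[best]


# Toy table: each symbol maps to two frequency indices (assumed values).
symbol_table = {0: [0, 3], 1: [1, 4], 2: [2, 5]}
spectrum = [0.2, 1.0, 0.1, 0.3, 1.0, 0.2]

accepted = decide_symbol(symbol_table, spectrum, threshold=1.5)
rejected = decide_symbol(symbol_table, spectrum, threshold=2.5)
print(accepted, rejected)
```

In this toy spectrum, symbol 1 has both of its frequencies at the band maximum, so it wins. It is reported as a valid symbol only under the lower threshold.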

The identified symbol or error from the comparator 1116 is passed to the circular buffers 1118 and the pre-existing code flag circular buffers 1120. An example implementation of the circular buffers 1118 is described in conjunction with FIG. 15. The example circular buffers 1118 comprise one circular buffer for each slide of the time domain to frequency domain converter 1108 (e.g., 192 buffers). Each circular buffer of the circular buffers 1118 includes one storage location for the synchronize symbol and each of the symbol blocks in a message (e.g., eight-block messages would be stored in eight-location circular buffers) so that an entire message can be stored in each circular buffer. Accordingly, as the audio samples are processed by the time domain to frequency domain converter 1108, the identified symbols are stored in the same location of each circular buffer until that location in each circular buffer has been filled. Then, symbols are stored in the next location in each circular buffer. In addition to storing symbols, the circular buffers 1118 may additionally include a location in each circular buffer to store a sample index indicating the sample in the audio signal that was received that resulted in the identified symbol.

The example pre-existing code flag circular buffers 1120 are implemented in the same manner as the circular buffers 1118, except the pre-existing code flag circular buffers 1120 include one location for the pre-existing code flag synchronize symbol and one location for each symbol in the pre-existing code flag message (e.g., a pre-existing code flag message that includes one synchronize symbol and one message symbol would be stored in two-location circular buffers). The pre-existing code flag circular buffers 1120 are populated at the same time and in the same manner as the circular buffers 1118.

The example message identifier 1122 analyzes the circular buffers 1118 and the pre-existing code flag circular buffers 1120 for a synchronize symbol. For example, the message identifier 1122 searches for a synchronize symbol in the circular buffers 1118 and a pre-existing code flag synchronize symbol in the pre-existing code flag circular buffers 1120. When a synchronize symbol is identified, the symbols following the synchronize symbol (e.g., seven symbols after a synchronize symbol in the circular buffers 1118 or one symbol after a pre-existing code flag synchronize symbol in the pre-existing code flag circular buffers 1120) are output by the message identifier 1122. In addition, the sample index identifying the last audio signal sample processed is output.

The message symbols and the sample index output by the message identifier 1122 are passed to the validator 1124, which validates each message. The validator 1124 includes a filter stack that stores several consecutively received messages. Because messages are repeated (e.g., every 2 seconds or 16,000 samples at 8 kHz), each message is compared with other messages in the filter stack that are separated by approximately the number of audio samples in a single message to determine if a match exists. If a match or substantial match exists, both messages are validated. If a match cannot be identified, it is determined that the message is an error and the message is not emitted from the validator 1124. In cases where messages might be affected by noise interference, messages might be considered a match when a subset of symbols in a message match the same subset in another already validated message. For example, if four of seven symbols in a message match the same four symbols in another message that has already been validated, the message can be identified as partially validated. Then, a sequence of the repeated messages can be observed to identify the non-matching symbols in the partially validated message.

The validated messages from the validator 1124 are passed to the symbol to bit converter 1126, which translates each symbol to the corresponding data bits of the message using the active symbol table.

An example decoding process 1200 is shown in FIG. 12. The example process 1200 may be carried out by the example decoder 116 shown in FIG. 11, or by any other suitable decoder. The example process 1200 begins by sampling audio (block 1202). The audio may be obtained via an audio sensor, a hardwired connection, via an audio file, or through any other suitable technique. As explained above, the sampling may be carried out at 8,000 Hz, or any other suitable frequency.

As each sample is obtained, the sample is aggregated by a stacker such as the example stacker 1104 of FIG. 11 (block 1204). An example process for performing the stacking is described in conjunction with FIG. 13.

The new stacked audio samples from the stacker process 1204 are inserted into a buffer and the oldest audio samples are removed (block 1206). As each sample is obtained, a sliding time to frequency conversion is performed on a collection of samples including numerous older samples and the newly added sample obtained at blocks 1202 and 1204 (block 1208). In one example, a sliding FFT may be used to process streaming input samples including 9215 old samples and the one newly added sample. In one example, the FFT using 9216 samples results in a spectrum having a resolution of 5.2 Hz (a 5.2 Hz resolution with 9216 samples corresponds to a sampling rate of 48 kHz; at a sampling rate of 8 kHz, 1,536 samples yield the same resolution).
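One common way to update a spectrum as each new sample arrives is the sliding DFT recurrence, sketched below for a single frequency bin. The text does not specify how the sliding FFT is computed, so this is a swapped-in illustration under that assumption and not the converter's actual method. The function names, the 16-sample window, and the test signal are all hypothetical. The small helper also shows the resolution arithmetic (sampling rate divided by window length).

```python
import cmath
import math


def bin_resolution_hz(sample_rate_hz, window_length):
    """Frequency spacing between adjacent DFT bins."""
    return sample_rate_hz / window_length


def direct_dft_bin(window, k):
    """Bin k of the DFT of `window`, computed from the definition."""
    n = len(window)
    return sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, x in enumerate(window))


def slide_dft_bin(prev_bin, oldest, newest, k, n):
    """Update bin k of an n-point DFT after dropping `oldest` and appending `newest`."""
    return (prev_bin - oldest + newest) * cmath.exp(2j * cmath.pi * k / n)


N = 16      # toy window length (the text uses far longer windows)
K = 3       # bin of interest
STEPS = 8   # number of one-sample slides
signal = [math.sin(0.7 * i) + 0.5 * math.cos(1.9 * i) for i in range(N + STEPS)]

bin_value = direct_dft_bin(signal[:N], K)
for step in range(STEPS):
    bin_value = slide_dft_bin(bin_value, signal[step], signal[step + N], K, N)

reference = direct_dft_bin(signal[STEPS:STEPS + N], K)
print(abs(bin_value - reference))
```

Each slide costs one complex multiplication per bin of interest, which is attractive when only the code frequencies need to be tracked. Rounding error accumulates over many slides, so a practical implementation would likely recompute the bins from scratch periodically.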

After the spectrum is obtained through the time to frequency conversion (block 1208), the transmitted symbol is determined (block 1210). An example process for determining the transmitted symbol is described in conjunction with FIG. 14.

After the transmitted symbol is identified (block 1210), buffer post processing is performed to identify a synchronize symbol and corresponding message symbols (block 1212). An example process for performing post-processing is described in conjunction with FIG. 15.

After post processing is performed to identify a transmitted message (block 1212), message validation is performed to verify the validity of the message (block 1214). An example process for performing the message validation is described in conjunction with FIG. 18.

After a message has been validated (block 1214), the message is converted from symbols to bits using the active symbol table (block 1216). Control then returns to block 1202 to process the next set of samples.

FIG. 13 illustrates an example process for stacking audio signal samples to accentuate an encoded code signal to implement the stack audio process 1204 of FIG. 12. The example process may be carried out by the stacker 1104 and the stacker controller 1106 of FIG. 11. The example process begins by determining if the stacker control is enabled (block 1302). When the stacker control is not enabled, no stacking is to occur and the process of FIG. 13 ends and control returns to block 1206 of FIG. 12 to process the audio signal samples unstacked.

When the stacker control is enabled, newly received audio signal samples are pushed into a buffer and the oldest samples are pushed out (block 1304). The buffer stores a plurality of samples. For example, when a particular message is repeatedly encoded in an audio signal every two seconds and the encoded audio is sampled at 8 kHz, each message will repeat every 16,000 samples so that the buffer will store some multiple of 16,000 samples (e.g., the buffer may store six messages with a 96,000 sample buffer). Then, the stacker 1104 selects substantially equal blocks of samples in the buffer (block 1306). The substantially equal blocks of samples are then summed (block 1308). For example, sample one is added to samples 16,001, 32,001, 48,001, 64,001, and 80,001; sample two is added to samples 16,002, 32,002, 48,002, 64,002, and 80,002; and so on, until sample 16,000 is added to samples 32,000, 48,000, 64,000, 80,000, and 96,000.

After the audio signal samples in the buffer are added, the resulting sequence is divided by the number of blocks selected (e.g., six blocks) to calculate an average sequence of samples (e.g., 16,000 averaged samples) (block 1310). The resulting average sequence of samples is output by the stacker (block 1312). The process of FIG. 13 then ends and control returns to block 1206 of FIG. 12.
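The summing and averaging of blocks 1306 through 1310 can be sketched as below. This is a minimal illustration under stated assumptions, not the stacker 1104 itself. The buffer is assumed to hold a whole number of equal-length blocks. The block length is shortened from 16,000 to 2,000 samples to keep the example quick. The repeating watermark pattern and the Gaussian stand-in for the host signal are invented for the demonstration.

```python
import random


def stack_blocks(buffer, block_length):
    """Average equal-length consecutive blocks of `buffer`, sample by sample."""
    if block_length <= 0 or not buffer or len(buffer) % block_length != 0:
        raise ValueError("buffer must hold a whole number of blocks")
    num_blocks = len(buffer) // block_length
    averaged = []
    for offset in range(block_length):
        # Sum the sample at this offset across every block, then divide.
        total = sum(buffer[offset + b * block_length] for b in range(num_blocks))
        averaged.append(total / num_blocks)
    return averaged


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)          # fixed seed so the run is repeatable
BLOCK = 2000                    # shortened stand-in for 16,000 samples
NUM_BLOCKS = 6
watermark = [0.1 * ((i % 7) - 3) for i in range(BLOCK)]   # assumed pattern
buffer = [watermark[i % BLOCK] + rng.gauss(0.0, 1.0)      # host modeled as noise
          for i in range(BLOCK * NUM_BLOCKS)]

stacked = stack_blocks(buffer, BLOCK)
single_error = mean_squared_error(buffer[:BLOCK], watermark)
stacked_error = mean_squared_error(stacked, watermark)
print(single_error, stacked_error)
```

With six blocks, the residual host power in the stacked sequence should be roughly one sixth of that in a single block, while the repeating pattern is preserved.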

FIG. 14 illustrates an example process for implementing the symbol determination process 1210 after the received audio signal has been converted to the frequency domain. The example process of FIG. 14 may be performed by the decoder 116 of FIGS. 1 and 11. The example process of FIG. 14 begins by normalizing the code frequencies in each of the critical bands (block 1402). For example, the code frequencies may be normalized so that the frequency with the greatest amplitude is set to one and all other frequencies in that critical band are adjusted accordingly. In the example decoder 116 of FIG. 11, the normalization is performed by the critical band normalizer 1110.

After the frequencies of interest have been normalized (block 1402), the example symbol scorer 1112 selects the appropriate symbol table based on the previously determined synchronization table (block 1404). For example, a system may include two symbol tables: one table for a normal synchronization and one table for a pre-existing code flag synchronization. Alternatively, the system may include a single symbol table or may include multiple synchronization tables that may be identified by synchronization symbols (e.g., cross-table synchronization symbols). The symbol scorer 1112 then computes a symbol score for each symbol in the selected symbol table (block 1406). For example, the symbol scorer 1112 may iterate across each symbol in the symbol table and add the normalized scores for each of the frequencies of interest for the symbol to compute a symbol score.

After each symbol is scored (block 1406), the example max score selector 1114 selects the symbol with the greatest score (block 1408). The example comparator 1116 then determines if the score for the selected symbol exceeds a maximum score threshold (block 1410). When the score does not exceed the maximum score threshold, an error indication is stored in the circular buffers (e.g., the circular buffers 1118 and the pre-existing code flag circular buffers 1120) (block 1412). The process of FIG. 14 then completes and control returns to block 1212 of FIG. 12.

When the score exceeds the maximum score threshold (block 1410), the identified symbol is stored in the circular buffers (e.g., the circular buffers 1118 and the pre-existing code flag circular buffers 1120) (block 1414). The process of FIG. 14 then completes and control returns to block 1212 of FIG. 12.

FIG. 15 illustrates an example process for implementing the buffer post processing 1212 of FIG. 12. The example process of FIG. 15 begins when the message identifier 1122 of FIG. 11 searches the circular buffers 1118 and the pre-existing code flag circular buffers 1120 for a synchronization indication (block 1502).

For example, FIG. 16 illustrates an example implementation of circular buffers 1118 and FIG. 17 illustrates an example implementation of pre-existing code flag circular buffers 1120. In the illustrated example of FIG. 16, the last location in the circular buffers to have been filled is location three as noted by the arrow. Accordingly, the sample index indicates the location in the audio signal samples that resulted in the symbols stored in location three. Because the line corresponding to sliding index 37 is a circular buffer, the consecutively identified symbols are 128, 57, 22, 111, 37, 23, 47, and 0. Because 128 in the illustrated example is a synchronize symbol, the message can be identified as the symbols following the synchronize symbol. The message identifier 1122 would wait until 7 symbols have been located following the identification of the synchronization symbol at sliding index 37.
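Reading a message out of one circular buffer can be sketched as follows. The physical layout used here is an assumption made for illustration and is not copied from FIG. 16: locations are taken to be zero-based, and the slot after the most recently filled location is taken to hold the oldest symbol. The function names and constants are likewise hypothetical.

```python
SYNC_SYMBOL = 128      # synchronize symbol from the example in the text
MESSAGE_LENGTH = 7     # symbols that follow the synchronize symbol


def unroll(circular, last_filled):
    """Return the buffer contents oldest-first, given the newest slot's index."""
    start = (last_filled + 1) % len(circular)
    return circular[start:] + circular[:start]


def extract_message(circular, last_filled,
                    sync=SYNC_SYMBOL, length=MESSAGE_LENGTH):
    """Return the symbols after the sync symbol, or None if incomplete."""
    ordered = unroll(circular, last_filled)
    if sync not in ordered:
        return None
    position = ordered.index(sync)
    message = ordered[position + 1:position + 1 + length]
    # Keep waiting until all message symbols have been received.
    return message if len(message) == length else None


# Assumed physical layout for the row at sliding index 37.
row_37 = [37, 23, 47, 0, 128, 57, 22, 111]
message = extract_message(row_37, last_filled=3)
print(message)
```

When the synchronize symbol is the newest entry, `extract_message` returns `None`, which mirrors the message identifier waiting until seven symbols have followed the synchronize symbol.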

The pre-existing code flag circular buffers 1120 of FIG. 17 include two locations for each circular buffer because the pre-existing code flag message of the illustrated example comprises one pre-existing code flag synchronize symbol (e.g., symbol 254) followed by a single message symbol. According to the illustrated example of FIG. 2, the pre-existing code flag data block 230 is embedded in two long blocks immediately following the 7-bit timestamp long block 228. Accordingly, because there are two long blocks for the pre-existing code flag data and each long block of the illustrated example is 1,536 samples at a sampling rate of 8 kHz, the pre-existing code flag data symbol will be identified in the pre-existing code flag circular buffers 3072 samples after the original message. In the illustrated example of FIG. 17, sliding index 37 corresponds to sample index 38744, which is 3072 samples later than sliding index 37 of FIG. 16 (sample index 35672). Accordingly, the pre-existing code flag data symbol 68 can be determined to correspond to the message in sliding index 37 of FIG. 16, indicating that the message in sliding index 37 of FIG. 16 identifies an original encoded message (e.g., identifies an original broadcaster of audio) and the sliding index 37 of FIG. 17 identifies a pre-existing code flag message (e.g., identifies a re-broadcaster of audio).

Returning to FIG. 15, after a synchronize or pre-existing code flag synchronize symbol is detected, messages in the circular buffers 1118 or the pre-existing code flag circular buffers 1120 are condensed to eliminate redundancy in the messages. For example, as illustrated in FIG. 16, due to the sliding time domain to frequency domain conversion and duration of encoding for each message, messages are identified in audio data for a period of time (sliding indexes 37-39 contain the same message). The identical messages in consecutive sliding indexes can be condensed into a single message because they are representative of only one encoded message. Alternatively, condensing may be eliminated and all messages may be output when desired. The message identifier 1122 then stores the condensed messages in a filter stack associated with the validator 1124 (block 1506). The process of FIG. 15 then ends and control returns to block 1214 of FIG. 12.
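The condensing step can be sketched as collapsing runs of identical messages found at consecutive sliding indexes. This is an illustrative sketch. The `condense` function, the use of `None` for a sliding index with no complete message, and the sample data are assumptions made for this example.

```python
def condense(messages_by_slide):
    """Keep the first message of each run of identical consecutive messages.

    `messages_by_slide` is a list of (sliding_index, message) pairs in
    sliding-index order, where `message` is None if no message was found.
    """
    condensed = []
    previous = None
    for slide_index, message in messages_by_slide:
        if message is not None and message != previous:
            condensed.append((slide_index, message))
        previous = message
    return condensed


# Sliding indexes 37-39 contain the same message, as in the example above.
message = (57, 22, 111, 37, 23, 47, 0)
detections = [(36, None), (37, message), (38, message),
              (39, message), (40, None)]
condensed = condense(detections)
print(condensed)
```

Only the entry for sliding index 37 survives, so a single message would be passed on to the filter stack.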

FIG. 18 illustrates an example process to implement the message validation process 1214 of FIG. 12. The example process of FIG. 18 may be performed by the validator 1124 of FIG. 11. The example process of FIG. 18 begins when the validator 1124 reads the top message in the filter stack (block 1802).

For example, FIG. 19 illustrates an example implementation of a filter stack. The example filter stack includes a message index, seven symbol locations for each message index, a sample index identification, and a validation flag for each message index. Each message is added at message index M7 and a message at location M0 is the top message that is read in block 1802 of FIG. 18. Due to sampling rate variation and variation of the message boundary within a message identification, it is expected that messages will be separated by sample indexes of multiples of approximately 16,000 samples when messages are repeated every 16,000 samples.

Returning to FIG. 18, after the top message in the filter stack is selected (block 1802), the validator 1124 determines if the validation flag indicates that the message has been previously validated (block 1804). For example, FIG. 19 indicates that message M0 has been validated. When the message has been previously validated, the validator 1124 outputs the message (block 1812) and control proceeds to block 1816.

When the message has not been previously validated (block 1804), the validator 1124 determines if there is another suitably matching message in the filter stack (block 1806). A message may be suitably matching when it is identical to another message, when a threshold number of message symbols match another message (e.g., four of the seven symbols), or when any other error determination indicates that two messages are similar enough to speculate that they are the same. According to the illustrated example, messages can only be partially validated with another message that has already been validated. When a suitable match is not identified, control proceeds to block 1814.

When a suitable match is identified, the validator 1124 determines if a time duration (e.g., in samples) between identical messages is proper (block 1808). For example, when messages are repeated every 16,000 samples, it is determined if the separation between two suitably matching messages is approximately a multiple of 16,000 samples. When the time duration is not proper, control proceeds to block 1814.

When the time duration is proper (block 1808), the validator 1124 validates both messages by setting the validation flag for each of the messages (block 1810). When the message has been validated completely (e.g., an exact match), the flag may indicate that the message is fully validated (e.g., the message validated in FIG. 19). When the message has only been partially validated (e.g., only four of seven symbols matched), the message is marked as partially validated (e.g., the message partially validated in FIG. 19). The validator 1124 then outputs the top message (block 1812) and control proceeds to block 1816.

When it is determined that there is not a suitable match for the top message (block 1806) or that the time duration between a suitable match(es) is not proper (block 1808), the top message is not validated (block 1814). Messages that are not validated are not output from the validator 1124.

After determining not to validate a message (blocks 1806, 1808, and 1814) or outputting the top message (block 1812), the validator 1124 pops the filter stack to remove the top message from the filter stack (block 1816). Control then returns to block 1802 to process the next message at the top of the filter stack.
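The validation flow of FIG. 18 can be sketched as below. This is a simplified illustration and not the validator 1124 itself. The `Message` class, the status strings, the 64-sample tolerance, and the function names are hypothetical. Unlike the flow described above, the sketch keeps searching the stack when a matching message has improper spacing instead of giving up at once.

```python
from dataclasses import dataclass

PERIOD = 16000     # samples between repeated messages (from the text)
TOLERANCE = 64     # assumed slack, in samples, for "approximately"
MIN_PARTIAL = 4    # e.g., four of seven symbols


@dataclass
class Message:
    symbols: tuple
    sample_index: int
    status: str = "unvalidated"   # or "full" / "partial"


def matching_symbols(a, b):
    return sum(1 for x, y in zip(a.symbols, b.symbols) if x == y)


def spacing_ok(a, b):
    """True when the gap is approximately a nonzero multiple of PERIOD."""
    gap = abs(a.sample_index - b.sample_index)
    if gap < PERIOD - TOLERANCE:
        return False
    remainder = gap % PERIOD
    return min(remainder, PERIOD - remainder) <= TOLERANCE


def validate_top(stack):
    """Process and pop the top message; return it if validated, else None."""
    top = stack[0]
    if top.status == "unvalidated":
        for other in stack[1:]:
            matches = matching_symbols(top, other)
            exact = matches == len(top.symbols)
            # Partial matches count only against an already validated message.
            partial = matches >= MIN_PARTIAL and other.status != "unvalidated"
            if (exact or partial) and spacing_ok(top, other):
                top.status = "full" if exact else "partial"
                if other.status == "unvalidated":
                    other.status = "full"
                break
    stack.pop(0)
    return top if top.status != "unvalidated" else None


stack = [
    Message((57, 22, 111, 37, 23, 47, 0), 35672),
    Message((1, 2, 3, 4, 5, 6, 7), 40000),
    Message((57, 22, 111, 37, 23, 47, 0), 51670),
]
first = validate_top(stack)    # exact match 15,998 samples later
second = validate_top(stack)   # no match, so nothing is output
third = validate_top(stack)    # already validated by the first pass
print(first, second, third)
```

The first and third messages validate each other because they are identical and about 16,000 samples apart, while the unmatched middle message is dropped.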

While example manners of implementing any or all of the example encoder 102 and the example decoder 116 have been illustrated and described above, one or more of the data structures, elements, processes and/or devices illustrated in the drawings and described above may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example encoder 102 and example decoder 116 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, the example encoder 102 and the example decoder 116 could be implemented by one or more circuit(s), programmable processor(s), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)), etc. For example, the decoder 116 may be implemented using software on a platform device, such as a mobile telephone. If any of the appended claims is read to cover a purely software implementation, at least one of the prior code detector 204, the example message generator 210, the symbol selector 212, the code frequency selector 214, the synthesizer 216, the inverse FFT 218, the mixer 220, the overlapping short block maker 240, the masking evaluator 242, the critical band pair definer 702, the frequency definer 704, the number generator 706, the redundancy reducer 708, the excess reducer 710, the code frequency definer 712, the LUT filler 714, the sampler 1102, the stacker 1104, the stacker controller 1106, the time domain to frequency domain converter 1108, the critical band normalizer 1110, the symbol scorer 1112, the max score selector 1114, the comparator 1116, the circular buffers 1118, the pre-existing code flag circular buffers 1120, the message identifier 1122, the validator 1124, and the symbol to bit converter 1126 are hereby expressly defined to include a tangible medium such as a memory, DVD, CD, etc. Further still, the example encoder 102 and the example decoder 116 may include data structures, elements, processes and/or devices instead of, or in addition to, those illustrated in the drawings and described above, and/or may include more than one of any or all of the illustrated data structures, elements, processes and/or devices.

FIG. 20 is a schematic diagram of an example processor platform 2000 that may be used and/or programmed to implement any or all of the example encoder 102 and the decoder 116, and/or any other component described herein. For example, the processor platform 2000 can be implemented by one or more general purpose processors, processor cores, microcontrollers, etc. Additionally, the processor platform 2000 may be implemented as a part of a device having other functionality. For example, the processor platform 2000 may be implemented using processing power provided in a mobile telephone, or any other handheld device.

The processor platform 2000 of the example of FIG. 20 includes at least one general purpose programmable processor 2005. The processor 2005 executes coded instructions 2010 and/or 2012 present in main memory of the processor 2005 (e.g., within a RAM 2015 and/or a ROM 2020). The processor 2005 may be any type of processing unit, such as a processor core, a processor and/or a microcontroller. The processor 2005 may execute, among other things, example machine accessible instructions implementing the processes described herein. The processor 2005 is in communication with the main memory (including the ROM 2020 and/or the RAM 2015) via a bus 2025. The RAM 2015 may be implemented by DRAM, SDRAM, and/or any other type of RAM device, and the ROM 2020 may be implemented by flash memory and/or any other desired type of memory device. Access to the memory 2015 and 2020 may be controlled by a memory controller (not shown).

The processor platform 2000 also includes an interface circuit 2030. The interface circuit 2030 may be implemented by any type of interface standard, such as a USB interface, a Bluetooth interface, an external memory interface, serial port, general purpose input/output, etc. One or more input devices 2035 and one or more output devices 2040 are connected to the interface circuit 2030.

Although certain example apparatus, methods, and articles of manufacture are described herein, other implementations are possible. The scope of coverage of this patent is not limited to the specific examples described herein. On the contrary, this patent covers all apparatus, methods, and articles of manufacture falling within the scope of the invention.

What is claimed is:
 1. A method to extract identifiers from media content, the method comprising: receiving a media content signal; sampling the media content signal to generate digital samples; storing the samples in a buffer; determining a first sequence of samples in the buffer; determining a second sequence of samples in the buffer, wherein the second sequence of samples is of substantially equal length as the first sequence of samples; calculating an average of the first sequence of samples and the second sequence of samples to generate an average sequence of samples; extracting an identifier from the average sequence of samples; and storing the identifier in a tangible memory.